For best performance, today's semantic segmentation methods rely on large, carefully labeled datasets, requiring expensive annotation budgets. In this work, we show that coarse annotation is a low-cost but highly effective alternative for training semantic segmentation models. Considering the urban scene segmentation scenario, we leverage cheap coarse annotations for real-world captured data, together with synthetic data, to train our model, and show performance competitive with that obtained from finely annotated real-world data. Specifically, we propose a coarse-to-fine self-training framework that generates pseudo labels for the unlabeled regions of the coarsely annotated data, using synthetic data to improve predictions around the boundaries between semantic classes and cross-domain data augmentation to increase diversity. Our extensive experimental results on the Cityscapes and BDD100k datasets demonstrate that our method achieves a significantly better performance-versus-annotation-cost tradeoff, yielding performance comparable to fully annotated data with only a small fraction of the annotation budget. Moreover, when used for pretraining, our framework outperforms the standard fully supervised setting.
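To make the pseudo-labeling step concrete, here is a minimal PyTorch sketch of filling the unlabeled regions of a coarse mask with confident model predictions. It is only an illustration of the general idea, not the paper's implementation: the function name `generate_pseudo_labels`, the ignore index of 255, and the confidence threshold are all assumptions.

```python
import torch

IGNORE_INDEX = 255  # assumed label value marking unlabeled pixels in the coarse masks

@torch.no_grad()
def generate_pseudo_labels(model, image, coarse_mask, conf_thresh=0.9):
    """Fill unlabeled regions of a coarse mask with confident model predictions.

    image:       (1, 3, H, W) input tensor
    coarse_mask: (1, H, W) long tensor, IGNORE_INDEX where no annotation exists
    model:       assumed to return per-pixel class logits of shape (1, C, H, W)
    """
    model.eval()
    logits = model(image)
    probs = torch.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)              # (1, H, W) confidence and argmax class

    pseudo = coarse_mask.clone()
    unlabeled = coarse_mask == IGNORE_INDEX
    confident = conf > conf_thresh
    # keep the human-provided coarse labels; only fill unlabeled, confident pixels
    pseudo[unlabeled & confident] = pred[unlabeled & confident]
    return pseudo
```

Low-confidence pixels stay ignored, so a fine-tuning loss would only be applied where either a human label or a confident pseudo label exists.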
Large pre-trained, zero-shot capable models have shown considerable success both for standard transfer and adaptation tasks, with particular robustness towards distribution shifts. In addition, subsequent fine-tuning can considerably improve performance on a selected downstream task. However, through naive fine-tuning, these zero-shot models lose their generalizability and robustness towards distribution shifts. This is a particular problem for tasks such as Continual Learning (CL), where continuous adaptation has to be performed as new task distributions are introduced sequentially. In this work, we show that where fine-tuning falls short in adapting such zero-shot capable models, simple momentum-based weight interpolation provides consistent improvements for CL tasks in both memory-free and memory-based settings. In particular, we find improvements of over $+4\%$ on standard CL benchmarks, while in parts reducing the error gap to the upper limit of jointly training on all tasks at once by more than half, allowing the continual learner to inch closer to the joint-training upper bound.
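The core operation is simple enough to sketch in a few lines of PyTorch: an exponential-moving-average style interpolation between a slowly moving copy of the weights and the model being fine-tuned. The function name, the choice of `tau`, and when to call it (per update or per task) are assumptions for illustration, not the paper's exact schedule.

```python
import copy
import torch

def momentum_interpolate(slow_model, fast_model, tau=0.99):
    """Momentum-based weight interpolation (EMA-style).

    slow_model: keeps the slowly moving, interpolated parameters
    fast_model: the model that is fine-tuned on the current task
    tau:        momentum; higher values retain more of the previous weights
    """
    with torch.no_grad():
        for p_slow, p_fast in zip(slow_model.parameters(), fast_model.parameters()):
            p_slow.mul_(tau).add_(p_fast, alpha=1.0 - tau)

# usage sketch: start from the zero-shot weights and interpolate during CL
# slow_model = copy.deepcopy(zero_shot_model)
# for batch in task_loader:
#     ...fine-tune fast_model on batch...
#     momentum_interpolate(slow_model, fast_model, tau=0.99)
```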
A grand goal in deep learning research is to learn representations capable of generalizing across distribution shifts. Disentanglement is one promising direction, aimed at aligning a model's representations with the underlying factors generating the data (e.g. color or background). Existing disentanglement methods, however, rely on an often unrealistic assumption: that factors are statistically independent. In reality, factors (like object color and shape) are correlated. To address this limitation, we propose a relaxed disentanglement criterion - the Hausdorff Factorized Support (HFS) criterion - that encourages a factorized support, rather than a factorial distribution, by minimizing a Hausdorff distance. This allows for arbitrary distributions of the factors over their support, including correlations between them. We show that the use of HFS consistently facilitates disentanglement and recovery of ground-truth factors across a variety of correlation settings and benchmarks, even under severe training correlations and correlation shifts, with relative improvements of over 60% in parts over existing disentanglement methods. In addition, we find that leveraging HFS for representation learning can even facilitate transfer to downstream tasks such as classification under distribution shift. We hope our original approach and positive empirical results inspire further progress on the open problem of robust generalization.
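As an illustration of what "minimizing a Hausdorff distance to encourage a factorized support" can look like in code, here is a minimal PyTorch sketch of a pairwise penalty over a mini-batch of latents. The estimator below (a directed Hausdorff distance from the product of pairwise marginal supports to the observed joint support) is an assumption about one possible implementation, not the paper's exact formulation.

```python
import torch

def pairwise_hfs_penalty(z):
    """Encourage a factorized support for a batch of latents z of shape (B, D).

    For every pair of dimensions (i, j), the product of the marginal supports is
    approximated by all B*B combinations of observed values, and the penalty is
    the directed Hausdorff distance from this product support to the observed
    joint support. Cost is O(D^2 * B^2), so keep B and D small in this sketch.
    """
    B, D = z.shape
    total = z.new_zeros(())
    for i in range(D):
        for j in range(i + 1, D):
            joint = torch.stack([z[:, i], z[:, j]], dim=1)            # (B, 2) observed pairs
            gi, gj = torch.meshgrid(z[:, i], z[:, j], indexing="ij")  # product of marginal supports
            product = torch.stack([gi.reshape(-1), gj.reshape(-1)], dim=1)  # (B*B, 2)
            dists = torch.cdist(product, joint)                        # (B*B, B)
            total = total + dists.min(dim=1).values.max()              # directed Hausdorff distance
    return total
```

The penalty is small only when every combination of observed marginal values also appears (approximately) somewhere in the joint support, i.e. when the support factorizes; the marginal distributions themselves remain unconstrained and may be arbitrarily correlated.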
Semantic image synthesis enables control over unconditional image generation by allowing guidance on what is being generated. We conditionally synthesize the latent space of a vector-quantized model (VQ model) pretrained to autoencode images. We find that jointly learning the conditioning and image latents significantly improves the modeling capability of the transformer, compared to training an autoregressive transformer on separately learned conditioning latents and image latents. While our jointly trained VQ model achieves similar reconstruction performance for both the semantic and image latents, tying the two modalities together during the autoencoding stage proves to be an important component for improving autoregressive modeling performance. We show that our model improves semantic image synthesis over autoregressive models on the popular semantic image datasets ADE20K, Cityscapes, and COCO-Stuff.
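For readers unfamiliar with conditional autoregressive modeling over VQ tokens, here is a generic PyTorch sketch: semantic tokens act as a conditioning prefix, the transformer predicts the sequence left-to-right, and the loss is taken only on the image-token positions. This is a standard setup for such models and an assumption about the general recipe, not the paper's architecture; all class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class ConditionalTokenTransformer(nn.Module):
    """Autoregressive transformer over [semantic tokens ; image tokens]."""

    def __init__(self, vocab_size, d_model=512, n_layers=8, n_heads=8, max_len=1024):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, sem_tokens, img_tokens):
        # shift right: position k predicts token k+1 of the concatenated sequence
        seq = torch.cat([sem_tokens, img_tokens], dim=1)           # (B, S+I)
        x = self.tok_emb(seq[:, :-1]) + self.pos_emb[:, : seq.size(1) - 1]
        L = x.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf"), device=x.device), diagonal=1)
        h = self.blocks(x, mask=causal)
        logits = self.head(h)                                        # (B, S+I-1, V)
        # only positions that predict image tokens contribute to the loss
        img_logits = logits[:, sem_tokens.size(1) - 1 :, :]
        return nn.functional.cross_entropy(
            img_logits.reshape(-1, img_logits.size(-1)), img_tokens.reshape(-1)
        )
```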
State-of-the-art deep learning models are often trained on large amounts of expensively labeled data. However, the need for exhaustive manual annotation can limit a model's generalizability in label-scarce regimes. Semi-supervised learning and unsupervised learning offer promising paradigms for learning from large amounts of unlabeled visual data. Recent progress in these paradigms indicates strong benefits of leveraging unlabeled data to improve model generalization and provide better model initialization. In this survey, we review recent advanced deep learning algorithms for semi-supervised learning (SSL) and unsupervised learning (UL) from a unified perspective. To offer a holistic understanding of the state of the art in these fields, we propose a unified taxonomy. We categorize existing representative SSL and UL methods with comprehensive and insightful analyses to highlight their design rationales across different learning scenarios and applications in different computer vision tasks. Finally, we discuss emerging trends and open challenges in SSL and UL to shed light on key directions for future research.
Humans show advanced abstraction capabilities in games that require quickly communicating object information: they decompose the message content into multiple parts and communicate them in an interpretable protocol. To endow machines with this capability, we propose the primitive-based sketch abstraction task, where the goal is to represent a sketch using a fixed set of drawing primitives under a budget. To solve this task, our Primitive Matching Network (PMN) learns an interpretable abstraction of a sketch in a self-supervised manner. Specifically, PMN maps each stroke of a sketch to its most similar primitive in a given set, predicting an affine transformation that aligns the selected primitive to the target stroke. We learn this stroke-to-primitive mapping end-to-end with a distance-transform loss that is minimal when the original sketch is precisely reconstructed with the predicted primitives. Our PMN abstraction empirically achieves the highest performance on sketch recognition and sketch-based image retrieval, while also being highly interpretable. This opens up new possibilities for sketch analysis, such as comparing sketches by extracting the most relevant primitives that define an object category. Code is available at https://github.com/explainableml/sketch-primitives.
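The stroke-to-primitive mapping can be pictured with the following hypothetical PyTorch sketch: each stroke is encoded, scored against a set of primitive embeddings, and a 2x3 affine transform is regressed that maps the chosen primitive's points onto the stroke. This is not PMN's actual architecture (and a hard argmax selection would need a differentiable relaxation to be trained end-to-end with a distance-transform loss); every name below is illustrative.

```python
import torch
import torch.nn as nn

class StrokeToPrimitive(nn.Module):
    """Hypothetical stroke-to-primitive matcher."""

    def __init__(self, n_primitives, d_model=128):
        super().__init__()
        self.encoder = nn.GRU(input_size=2, hidden_size=d_model, batch_first=True)
        self.prim_emb = nn.Embedding(n_primitives, d_model)
        self.affine_head = nn.Linear(2 * d_model, 6)

    def forward(self, stroke_points):
        # stroke_points: (B, T, 2) sequences of 2D stroke coordinates
        _, h = self.encoder(stroke_points)
        h = h.squeeze(0)                                   # (B, d_model) stroke encodings
        scores = h @ self.prim_emb.weight.t()              # similarity to each primitive
        choice = scores.argmax(dim=1)                       # selected primitive per stroke
        chosen = self.prim_emb(choice)
        theta = self.affine_head(torch.cat([h, chosen], dim=1)).view(-1, 2, 3)
        return choice, theta                                 # selection + affine parameters

def apply_affine(theta, prim_points):
    """Apply a 2x3 affine transform to primitive points of shape (B, P, 2)."""
    ones = torch.ones_like(prim_points[..., :1])
    homog = torch.cat([prim_points, ones], dim=-1)           # (B, P, 3) homogeneous coords
    return homog @ theta.transpose(1, 2)                      # (B, P, 2) transformed points
```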
Audio-visual generalized zero-shot learning for video classification requires understanding the relations between audio and visual information in order to recognize samples from novel, previously unseen classes at test time. The natural semantic and temporal alignment between audio and visual data in videos can be exploited to learn strong representations that generalize to unseen classes at test time. We propose a multi-modal and Temporal Cross-attention Framework (TCaF) for audio-visual generalized zero-shot learning. Its inputs are temporally aligned audio and visual features obtained from pretrained networks. Encouraging the framework to focus on cross-modal correspondences across time, rather than on self-attention within modalities, significantly improves performance. We show that our proposed framework, which ingests temporal features, obtains state-of-the-art performance on the UCF-GZSL, VGGSound-GZSL, and ActivityNet-GZSL benchmarks. Code to reproduce all results is available at https://github.com/explainableml/tcaf-gzsl.
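The cross-modal attention idea can be sketched with a single generic block in PyTorch, where each modality queries the other across time instead of attending to itself. This is a minimal sketch of the mechanism only, not the TCaF architecture; dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class CrossModalAttentionBlock(nn.Module):
    """Minimal cross-attention sketch: each modality queries the other across time."""

    def __init__(self, dim=512, n_heads=8):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, audio, visual):
        # audio:  (B, Ta, dim) temporally ordered audio features
        # visual: (B, Tv, dim) temporally ordered visual features
        a_attn, _ = self.a2v(query=audio, key=visual, value=visual)   # audio attends to video
        v_attn, _ = self.v2a(query=visual, key=audio, value=audio)    # video attends to audio
        audio = self.norm_a(audio + a_attn)                            # residual connections
        visual = self.norm_v(visual + v_attn)
        return audio, visual
```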
High-quality calibrated uncertainty estimates are crucial for numerous real-world applications, especially for deployed ML systems based on deep learning. While Bayesian deep learning techniques allow uncertainty estimation, training them on large-scale datasets is an expensive process that does not always yield models competitive with their non-Bayesian counterparts. Moreover, many of the high-performing deep learning models that are already trained and deployed are non-Bayesian in nature and do not provide uncertainty estimates. To address these issues, we propose BayesCap, which learns a Bayesian identity mapping for a frozen model, allowing uncertainty estimation. BayesCap is a memory-efficient method that can be trained on a small fraction of the original dataset, enhancing pretrained non-Bayesian computer vision models by providing calibrated uncertainty estimates for their predictions, without (i) hampering the performance of the model and (ii) requiring expensive retraining of the model from scratch. The proposed method is agnostic to architectures and tasks. We show the efficacy of our method on a wide variety of tasks with a diverse set of architectures, including image super-resolution, deblurring, inpainting, and critical applications such as medical image translation. Moreover, we apply the derived uncertainty estimates to detect out-of-distribution samples in critical scenarios such as depth estimation for autonomous driving. Code is available at https://github.com/explainableml/bayescap.
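The "identity mapping with uncertainty" idea can be illustrated with a small cap network on top of the frozen model's output, trained with a heteroscedastic negative log-likelihood where the target is the frozen output itself. This sketch simplifies the output distribution to a Gaussian and uses hypothetical names; it is not the method's exact parameterization.

```python
import torch
import torch.nn as nn

class SimpleCap(nn.Module):
    """A small 'cap' on a frozen model's output: reconstructs that output
    (identity mapping) and predicts a per-pixel log-variance for uncertainty."""

    def __init__(self, channels=3, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
        )
        self.mu_head = nn.Conv2d(hidden, channels, 3, padding=1)
        self.logvar_head = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, y_frozen):
        h = self.body(y_frozen)
        return self.mu_head(h), self.logvar_head(h)

def cap_nll_loss(mu, logvar, y_frozen):
    """Gaussian negative log-likelihood; the target is the frozen model's own
    output, so the cap learns an identity mapping plus an uncertainty estimate."""
    return (0.5 * torch.exp(-logvar) * (mu - y_frozen) ** 2 + 0.5 * logvar).mean()

# usage sketch (the base model stays frozen throughout):
# with torch.no_grad():
#     y_frozen = frozen_model(x)
# mu, logvar = cap(y_frozen)
# loss = cap_nll_loss(mu, logvar, y_frozen)
```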
Proxy-based deep metric learning (DML) learns deep representations by embedding images close to their class representatives (proxies), usually with respect to the angle between them. However, this disregards the embedding norm, which can carry additional beneficial context such as class- or image-intrinsic uncertainty. In addition, proxy-based DML struggles to learn intra-class structure. To address both issues at once, we introduce non-isotropic probabilistic proxy-based DML. We model images as directional von Mises-Fisher (vMF) distributions on the hypersphere, which can reflect image-intrinsic uncertainty. In addition, we equip class proxies with non-isotropic von Mises-Fisher (nivMF) distributions to better represent complex class-specific variances. To measure the proxy-to-image distance between these models, we develop and investigate multiple distribution-to-point and distribution-to-distribution metrics. Each framework choice is motivated by a series of ablation studies, which showcase beneficial properties of our probabilistic approach to proxy-based DML, such as uncertainty-awareness, better-behaved gradients during training, and overall improved generalization performance. The latter is especially reflected in the competitive performance on standard DML benchmarks, where our approach compares favorably, suggesting that existing proxy-based DML can benefit greatly from a more probabilistic treatment. Code is available at github.com/explainableml/probabilistic_deep_metric_learning.
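To give a feel for how a vMF view changes a proxy objective, here is a simplified isotropic sketch in PyTorch: the embedding direction determines which proxy is closest on the hypersphere, while the unnormalized norm acts as a per-image concentration. This is an assumption-laden illustration, not the paper's non-isotropic formulation or its exact metrics.

```python
import torch
import torch.nn.functional as F

def vmf_proxy_loss(embeddings, proxies, labels):
    """Simplified isotropic-vMF proxy objective (a sketch only).

    embeddings: (B, d) unnormalized image embeddings
    proxies:    (C, d) one learnable proxy per class
    labels:     (B,) class indices
    """
    kappa = embeddings.norm(dim=1, keepdim=True)   # image-level concentration
    z = F.normalize(embeddings, dim=1)              # directions on the hypersphere
    mu = F.normalize(proxies, dim=1)                 # proxy mean directions
    # kappa * cosine similarity is the vMF log-density up to a kappa-dependent
    # normalizer, which is constant across proxies and cancels in the softmax
    logits = kappa * (z @ mu.t())
    return F.cross_entropy(logits, labels)
```

Confident images (large norm, hence large kappa) produce sharper proxy assignments, while uncertain images spread their probability mass more evenly over proxies.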
When do gradient-based explanation algorithms provide meaningful explanations? We propose a necessary criterion: their feature attributions need to be aligned with the tangent space of the data manifold. To provide evidence for this hypothesis, we introduce a framework based on variational autoencoders that allows estimating and generating image manifolds. Through experiments across a range of different datasets - MNIST, EMNIST, CIFAR10, X-ray pneumonia, and diabetic retinopathy detection - we demonstrate that the more a feature attribution is aligned with the tangent space of the data, the more structured and explainable it tends to be. In particular, the attributions provided by popular post-hoc methods such as Integrated Gradients, SmoothGrad, and Input $\times$ Gradient tend to be more strongly aligned with the data manifold than raw gradients. As a consequence, we suggest that explanation algorithms should actively strive to align their explanations with the data manifold. In part, this can be achieved by adversarial training, which leads to better alignment across all datasets. Some form of adjustment to the model architecture or training algorithm is necessary, since we show that generalization of neural networks alone does not imply alignment of model gradients with the data manifold.
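One way to picture the alignment measurement is the following PyTorch sketch: estimate the tangent space from the Jacobian of a generative decoder at the image's latent code, then compute how much of an attribution's norm lies in that subspace. The function name and the exact estimator are assumptions for illustration, not the paper's procedure.

```python
import torch
from torch.autograd.functional import jacobian

def manifold_alignment(attribution, decoder, z):
    """Fraction of an attribution's norm lying in the tangent space of the
    decoded image manifold at latent code z (one possible estimator).

    attribution: flattened attribution map, shape (P,)
    decoder:     callable mapping a latent code (d,) to a flattened image (P,)
    z:           latent code of the image under consideration, shape (d,)
    """
    J = jacobian(decoder, z)                 # (P, d): columns span the tangent space
    Q, _ = torch.linalg.qr(J)                 # orthonormal basis of the tangent space
    proj = Q @ (Q.t() @ attribution)           # projection onto the tangent space
    return proj.norm() / attribution.norm()   # 1.0 means fully aligned
```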